Nonlinear Feature Transformations for Noise Robust Speech Recognition

Author

  • SHAJITH IKBAL
Abstract

Robustness against external noise is an important requirement for automatic speech recognition (ASR) systems deployed in practical applications. This thesis proposes and evaluates new feature-based approaches for improving ASR noise robustness. These approaches are based on nonlinear transformations that, when applied to the spectrum or the features, aim to emphasize the parts of the speech that are relatively invariant to noise and/or deemphasize the parts that are more sensitive to noise. Spectral peaks constitute the high signal-to-noise ratio part of the speech, so an efficient parameterization of only the components at the peak locations is expected to improve noise robustness. Evaluating this requires estimating the peak locations.

Two methods are proposed in this thesis for the peak estimation task: 1) a frequency-based dynamic programming (DP) algorithm that uses the spectral slope values of a single time frame, and 2) an HMM/ANN-based algorithm that uses distinct time-frequency (TF) patterns in the spectrogram (thus imposing temporal constraints during peak estimation). Because the distinct TF patterns are learned in an unsupervised manner, the HMM/ANN-based algorithm is sensitive to energy fluctuations in the TF patterns, which is not the case with the frequency-based DP algorithm. For an efficient parameterization of the spectral components around the peak locations, parameters describing the activity pattern (energy surface) within local TF patterns around the spectral peaks are computed and used as features. These features, referred to as spectro-temporal activity pattern (STAP) features, show improved noise robustness; however, they are inferior to the standard features in clean speech. The main reason is the complete masking of the non-peak regions of the spectrum, which also carry significant information required for clean speech recognition.
This leads to the development of a new approach that uses a soft-masking procedure instead of discarding the non-peak spectral components completely. In this approach, referred to as phase i
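The contrast between complete masking and soft masking of non-peak regions can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: peaks are picked with a simple slope sign-change rule on a single frame (not the frequency-based DP algorithm), and soft masking is modeled as scaling non-peak bins by a fixed factor. The function names `find_peaks` and `mask_spectrum` and the parameters `width` and `floor` are hypothetical.

```python
def find_peaks(spectrum):
    """Return bin indices where the spectral slope turns from positive to non-positive.

    Simplified stand-in for peak estimation (assumption: not the DP algorithm).
    """
    slopes = [spectrum[i + 1] - spectrum[i] for i in range(len(spectrum) - 1)]
    return [i + 1 for i in range(len(slopes) - 1)
            if slopes[i] > 0 and slopes[i + 1] <= 0]


def mask_spectrum(spectrum, peaks, width=1, floor=0.0):
    """Keep bins within `width` of a peak and scale every other bin by `floor`.

    floor=0.0 discards non-peak bins completely (hard masking);
    0 < floor < 1 only attenuates them (a hypothetical form of soft masking).
    """
    keep = set()
    for p in peaks:
        keep.update(range(max(0, p - width), min(len(spectrum), p + width + 1)))
    return [v if i in keep else v * floor for i, v in enumerate(spectrum)]


# Toy magnitude spectrum of one frame (made-up values for illustration).
frame = [1.0, 2.0, 5.0, 3.0, 1.0, 1.5, 4.0, 2.0, 1.0]
peaks = find_peaks(frame)
hard = mask_spectrum(frame, peaks, width=1, floor=0.0)
soft = mask_spectrum(frame, peaks, width=1, floor=0.3)
```

With `floor=0.0` the non-peak bins carry no information at all, which mirrors the clean-speech weakness described above; a nonzero `floor` retains them at reduced weight.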


Similar resources

A new method for robust speech recognition based on missing data using a bidirectional neural network

The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing, deleted, and then replaced using the remaining components and statistical ...


Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step that is applied to t...


A Correlational Discriminant Approach to Feature Extraction for Robust Speech Recognition

A nonlinear discriminant analysis based approach to feature space dimensionality reduction in noise robust automatic speech recognition (ASR) is proposed. It utilizes a correlation based distance measure instead of the conventional Euclidean distance. The use of this ‘correlation preserving discriminant analysis’ (CPDA) procedure is motivated by evidence suggesting that correlation based cepstr...


Robust ASR front-end using spectral-based and discriminant features: experiments on the Aurora tasks

This paper describes an automatic speech recognition frontend that combines low-level robust ASR feature extraction techniques, and higher-level linear and non-linear feature transformations. The low-level algorithms use data-derived filters, mean and variance normalization of the feature vectors, and dropping of noise frames. The feature vectors are then linearly transformed using Principal Co...
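Mean and variance normalization of feature vectors, one of the low-level steps named above, can be sketched as follows. This is a generic per-utterance version written under assumptions: the helper name `mean_variance_normalize` is hypothetical, statistics are computed over all frames of one utterance, and a zero standard deviation is replaced by 1.0 to avoid division by zero. The paper's exact variant is not visible in the excerpt.

```python
import math


def mean_variance_normalize(features):
    """Normalize each feature dimension to zero mean and unit variance.

    `features` is a list of frames, each a list of coefficients of equal length.
    """
    n_frames = len(features)
    n_dims = len(features[0])
    normalized = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        column = [frame[d] for frame in features]
        mean = sum(column) / n_frames
        variance = sum((v - mean) ** 2 for v in column) / n_frames
        # Assumption: constant dimensions are left centered but unscaled.
        std = math.sqrt(variance) or 1.0
        for t in range(n_frames):
            normalized[t][d] = (column[t] - mean) / std
    return normalized


# Two toy frames with two coefficients each (made-up values).
normalized = mean_variance_normalize([[1.0, 10.0], [3.0, 10.0]])
```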


Nonlinear Spectral Transformations for Robust Speech Recognition

Recently, a nonlinear transformation of autocorrelation coefficients named Phase AutoCorrelation (PAC) coefficients has been considered for feature extraction [1]. PAC based features show improved robustness to additive noise as a result of two operations, performed during the computation of PAC, namely energy normalization and inverse cosine transformation. In spite of the improved robustness ...
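The two operations named above, energy normalization and the inverse cosine transformation, can be sketched as below. This is an illustration under assumptions rather than the definition from the cited work: the autocorrelation is taken to be circular over one frame, so that each coefficient divided by the frame energy is the cosine of the angle between the frame and its shifted copy. The function names are hypothetical.

```python
import math


def circular_autocorrelation(frame):
    """Autocorrelation with wrap-around shifts (assumption: circular form)."""
    n = len(frame)
    return [sum(frame[t] * frame[(t + k) % n] for t in range(n))
            for k in range(n)]


def phase_autocorrelation(frame):
    """Map each autocorrelation coefficient to an angle in [0, pi].

    Step 1: divide by the frame energy R[0] (energy normalization).
    Step 2: apply the inverse cosine to the normalized value.
    """
    r = circular_autocorrelation(frame)
    energy = r[0]
    if energy == 0.0:
        raise ValueError("frame has zero energy; the angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return [math.acos(max(-1.0, min(1.0, rk / energy))) for rk in r]


# Toy frame (made-up values); scaling it leaves the angles unchanged.
pac = phase_autocorrelation([1.0, 0.0, -1.0, 0.0])
pac_scaled = phase_autocorrelation([3.0, 0.0, -3.0, 0.0])
```

Because of the division by the frame energy, the resulting angles do not depend on the overall gain of the frame, which the scaled example above demonstrates.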




Publication date: 2004